
Algorithmic paradigms for stability-based cluster validity and model selection statistical methods, with applications to microarray data analysis.

Abstract

The advent of high-throughput technologies for biological research, in particular microarrays, has revived interest in clustering, resulting in a plethora of new clustering algorithms. However, model selection, i.e., the identification of the correct number of clusters in a dataset, has received relatively little attention. Indeed, although it is central to statistics, its difficulty is also well known. Fortunately, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and have gained prominence in microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of prediction, but also the slowest in terms of running time. It is very unfortunate that an area of statistics as fascinating and classic as model selection, with important practical applications, has received very little attention in terms of algorithmic design and engineering. In this paper, in order to partially fill this gap, we make the following contributions: (A) the first general algorithmic paradigm for stability-based methods for model selection; (B) reductions showing that all of the known methods in this class are instances of the proposed paradigm; (C) a novel algorithmic paradigm for the class of stability-based methods for cluster validity, i.e., methods assessing how statistically significant a given clustering solution is; (D) a general algorithmic paradigm that describes heuristic and very effective speed-ups known in the literature for stability-based model selection methods.

Since the performance evaluation of model selection algorithms is mainly experimental, we offer, for completeness and without attempting to be exhaustive, a representative synopsis of known experimental benchmarking results that highlight the ability of stability-based methods for model selection and the computational resources they require for the task. As a whole, the contributions of this paper generalize reference methodologies in statistics in several respects and show that algorithmic approaches can yield deep methodological insights into this area, in addition to practical computational procedures.
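To make the idea of "stability-based" model selection concrete, here is a minimal sketch. It is not the general paradigm proposed in the paper; it is one simple instance under stated assumptions: k-means (with k-means++ seeding) as the clustering routine, pairs of random subsamples each holding 80% of the points, and a Jaccard coefficient on co-membership pairs as the agreement measure between two clusterings on their shared points. The candidate number of clusters with the highest mean agreement is returned. All function names and parameters (`kmeans`, `comembership_jaccard`, `stability`, `select_k`, `n_pairs`, `frac`) are hypothetical.

```python
"""Hypothetical sketch of stability-based model selection for clustering.

Illustrative only: cluster two random subsamples with k-means, measure how
well the two solutions agree on the shared points, repeat, and prefer the k
whose solutions agree most often.
"""
import random
from itertools import combinations


def _sq_dist(p, q):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=100):
    # k-means++ seeding: each new centre is drawn with probability proportional to D^2.
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(_sq_dist(p, c) for c in centres) for p in points]
        if sum(weights) == 0:
            centres.append(rng.choice(points))  # all points coincide with centres
        else:
            centres.append(rng.choices(points, weights=weights)[0])
    labels = []
    for _ in range(iters):
        # Lloyd step: assign each point to its nearest centre, then recompute means.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centres[c])) for p in points]
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's old centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def comembership_jaccard(labels_a, labels_b):
    # labels_a, labels_b: dicts mapping the same point indices to cluster labels.
    # Counts point pairs placed together by both solutions vs. by either one.
    both = either = 0
    for i, j in combinations(sorted(labels_a), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0


def stability(points, k, n_pairs=20, frac=0.8, seed=0):
    # Mean agreement of k-means solutions over n_pairs pairs of random subsamples.
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(n_pairs):
        idx_a = rng.sample(range(n), m)
        idx_b = rng.sample(range(n), m)
        lab_a = dict(zip(idx_a, kmeans([points[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([points[i] for i in idx_b], k, rng)))
        shared = set(idx_a) & set(idx_b)
        scores.append(comembership_jaccard({i: lab_a[i] for i in shared},
                                           {i: lab_b[i] for i in shared}))
    return sum(scores) / len(scores)


def select_k(points, candidate_ks, **kwargs):
    # Return the most stable k together with the score of every candidate.
    scores = {k: stability(points, k, **kwargs) for k in candidate_ks}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Demo on three well-separated synthetic 2-D Gaussian blobs.
    gen = random.Random(1)
    blob_centres = [(0.0, 0.0), (20.0, 0.0), (10.0, 17.0)]
    data = [(cx + gen.gauss(0, 0.3), cy + gen.gauss(0, 0.3))
            for cx, cy in blob_centres for _ in range(30)]
    best, scores = select_k(data, range(2, 6))
    print(best, {k: round(s, 3) for k, s in scores.items()})
```

Picking the single k with the highest mean agreement is a deliberate simplification for illustration; stability-based methods in the literature typically examine the whole distribution of agreement scores or compare it against a null model. The repeated re-clustering of subsamples is also why these methods are costly in running time, which is the bottleneck the abstract points to.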

Bibliographic details

  • Authors

    Giancarlo, R.; Utro, F.

  • Year: 2012
  • Original format: PDF
  • Language: English
